Summary.

This paper shows how four statistics (Kolmogorov-Smirnov, Cramér-von Mises, and the Kuiper and Watson extensions) may be used to test whether a given sample comes from the exponential distribution with unknown parameter. Simple modifications of the basic definitions make it possible to use each statistic with only one line of percentage points; in turn, these may be reduced to chi-square points. The tests are more powerful than the usual Pearson chi-square test, and are very well adapted for use with a computer.


1.1  Introduction.


Suppose a random sample consists of $n$ values $x_1, x_2, \ldots, x_n$. We wish to test the null hypothesis $H_0$: the sample comes from the exponential distribution, with distribution function and density

(1)  $F(x) = 1 - e^{-\theta x}, \quad f(x) = \theta e^{-\theta x}, \quad x > 0.$


This work was supported also by the National Research Council of Canada.

The parameter $\theta$ is not known, and will be estimated by the maximum likelihood estimator $\hat\theta = 1/\bar{x}$. The tests given will use Kolmogorov-type statistics, i.e., those based on a measure of the difference between the sample (or empirical) distribution function $F_n(x)$ and the hypothesised distribution function $F(x)$. We shall consider four of these statistics, usually known as $D$ (Kolmogorov-Smirnov), $W^2$ (Cramér-von Mises), $V$ (Kuiper) and $U^2$ (Watson); customarily, they are given a suffix $n$ to show the dependence of their distributions on sample size, but this will be omitted.

1.2  Null Distributions of Kolmogorov-type Statistics.

Kolmogorov-type statistics are used to test whether a random sample comes from a given distribution; let the distribution function be $G(x)$, to distinguish from the special $F(x)$ defined in (1). By null distribution is meant the distribution of the test statistic when the null hypothesis is true. It is well known that when $G(x)$ is completely specified, the null distributions of the four statistics above do not depend on $G(x)$, but only on sample size $n$; these distributions have all been tabulated, so that the goodness-of-fit test is available. Further, the statistics have recently been modified to remove the dependence on sample size (Stephens, 1970). When $G(x)$ contains one or more parameters which must be estimated from the sample, the null distributions are changed, and the standard percentage points of $D$, $W^2$, $V$ and $U^2$ do not apply. It has been shown that, for certain types of parameter, and for certain estimators, the null distribution will depend on the family of distributions specified by $G(x)$, but not on the specific true parameter values (Darling, 1955).

This will be so for the situation treated in this paper, where $\theta$ in (1) is a scale parameter and $\hat\theta$ is the maximum likelihood estimator. Nevertheless, the exact null distributions of the test statistics are still difficult to find; this paper gives Monte Carlo results for the percentage points. Modifications of the test statistics are also given; the modified test statistics each require only one line of percentage points, independent of $n$. These in turn may be reduced to values in a chi-square table. Results for the statistic $D$ have been given also by Lilliefors (1969).

1.3  Practical  Considerations. 

It has long been known that Kolmogorov-type statistics possess good power properties compared with the Pearson chi-square statistic; difficulty of calculation, together with the fact that $G(x)$ had to be completely specified, has presumably inhibited their use until now. For the present application there are several merits to the statistics:

(a)  the difficulty of estimating the parameter has been removed;

(b)  the power properties will still be good (see section 2.9);

(c)  with a computer routine, the statistics are easy to calculate, and, with the modifications removing the need for long tables of percentage points, the tests become extremely easy to apply.

Similar remarks apply to testing for the normal distribution when parameters are not known; recent work on this subject is in Lilliefors (1967) and Stephens (1969a).




In section 2 the test of $H_0$ is set out. The formulas given for $D$, $W^2$, $V$ and $U^2$ come from their definitions, with the estimate of $\theta$ used in $F(x)$. The modifications are then given, and the percentage points of the modified forms are in Table 1. These percentage points are the points for the asymptotic distributions of $\sqrt{n}D$, $W^2$, $\sqrt{n}V$ and $U^2$, assuming $H_0$ true and the estimate of $\theta$ used. It is possible to get some theoretical results on the asymptotic distributions of $W^2$ and $U^2$, and these are used to give $\chi^2$ approximations to the percentage points; similar $\chi^2$ approximations are given also for $D$ and $V$. A short table of smoothed Monte Carlo points for the unmodified statistics is included; comparison may then be made with the results, for $D$, of Lilliefors (1969).

2.  Kolmogorov-type Statistics: Modifications for Testing for the Exponential Distribution.

2.1.  The test is of $H_0$: that a given random sample of size $n$ comes from $F(x) = 1 - e^{-\theta x}$, $\theta$ unknown. For all the four statistics we first follow these steps.

(a)  Assume the $x_i$, $i = 1, 2, \ldots, n$, are in ascending order.

(b)  Calculate $\bar{x}$, the mean of the sample, and the $y$ values $y_i = x_i/\bar{x}$, $i = 1, \ldots, n$.

(c)  Calculate $z_i = 1 - \exp(-y_i)$, $i = 1, 2, \ldots, n$.

The four statistics are calculated from the $z$ values.
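As a minimal sketch, steps (a) to (c) might be coded as follows. The function name `exponential_z_values` and the five data values are illustrative assumptions, and the scaling $y_i = x_i/\bar{x}$ is taken to follow from the estimate $1/\bar{x}$ of $\theta$.

```python
import math

def exponential_z_values(sample):
    """Steps (a)-(c): sort, scale by the sample mean, transform to z."""
    xs = sorted(sample)                        # (a) ascending order
    mean = sum(xs) / len(xs)                   # (b) sample mean
    ys = [x / mean for x in xs]                #     y_i = x_i / mean
    return [1.0 - math.exp(-y) for y in ys]    # (c) z_i = 1 - exp(-y_i)

# Hypothetical data, used only to exercise the function.
z = exponential_z_values([0.8, 0.1, 2.3, 1.2, 0.5])
```

Because the observations are divided by their own mean, the $z$ values do not change if the whole sample is multiplied by a constant, which is why the tests do not depend on the true scale.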


2.2  The Kolmogorov Statistic D.

(1)  Calculate $D^+ = \max_i\,(i/n - z_i)$, $D^- = \max_i\,(z_i - (i-1)/n)$ and $D = \max(D^+, D^-)$.

(2)  Modification. Calculate
$D^* = (D - 0.2/n)(\sqrt{n} + 0.26 + 0.5/\sqrt{n})$.

(3)  Test of $H_0$. Compare $D^*$ with its upper tail percentage points given in Table 1: if $D^*$ exceeds a given value, reject $H_0$ at the corresponding significance level.
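The calculation in this section can be sketched as below. The function name and data are hypothetical, and the coefficient 0.5 in the last term of the modification is an assumption, chosen because it reproduces the approximate points for $\sqrt{n}D$ listed in Table 4.

```python
import math

def kolmogorov_exponential(sample):
    """Return (D, D_star) for the exponential null with estimated scale."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    z = [1.0 - math.exp(-x / mean) for x in xs]
    # enumerate() is 0-based, so (i + 1)/n is i/n in the paper's indexing.
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    d = max(d_plus, d_minus)
    root_n = math.sqrt(n)
    # Assumed form of the modification: (D - 0.2/n)(sqrt(n) + 0.26 + 0.5/sqrt(n)).
    d_star = (d - 0.2 / n) * (root_n + 0.26 + 0.5 / root_n)
    return d, d_star

# Upper tail points of D* from Table 1, keyed by percentage level.
D_STAR_POINTS = {10: 0.990, 5: 1.094, 2.5: 1.190, 1: 1.308}

d, d_star = kolmogorov_exponential([0.8, 0.1, 2.3, 1.2, 0.5])
reject_at_5_percent = d_star > D_STAR_POINTS[5]
```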

2.3  The Cramér-von Mises Statistic $W^2$.

(1)  Calculate $W^2 = \sum_i \,(z_i - (2i-1)/2n)^2 + 1/(12n)$.

(2)  Modification. Calculate $W^* = W^2 (1 + 0.16/n)$.

(3)  Test of $H_0$. Compare $W^*$ with its upper tail percentage points, given in Table 1.
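A short sketch of this calculation, with a hypothetical function name and made-up data, might look like this.

```python
import math

def cramer_von_mises_exponential(sample):
    """Return (W2, W_star) for the exponential null with estimated scale."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    z = [1.0 - math.exp(-x / mean) for x in xs]
    # W^2 = sum_i (z_i - (2i - 1)/(2n))^2 + 1/(12n), with i running from 1 to n.
    w2 = sum((zi - (2 * i - 1) / (2 * n)) ** 2
             for i, zi in enumerate(z, start=1)) + 1.0 / (12 * n)
    return w2, w2 * (1.0 + 0.16 / n)

# Upper tail points of W* from Table 1, keyed by percentage level.
W_STAR_POINTS = {10: 0.178, 5: 0.225, 2.5: 0.276, 1: 0.349}

w2, w_star = cramer_von_mises_exponential([0.8, 0.1, 2.3, 1.2, 0.5])
reject_at_5_percent = w_star > W_STAR_POINTS[5]
```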

2.4  The Kuiper Statistic V.

(1)  Calculate $D^+$, $D^-$ as in section 2.2, and $V = D^+ + D^-$.

(2)  Modification. Calculate
$V^* = (V - 0.2/n)(\sqrt{n} + 0.24 + 0.35/\sqrt{n})$.

(3)  Test of $H_0$. Compare $V^*$ with its upper tail percentage points, given in Table 1.
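The Kuiper calculation can be sketched in the same way. The function name and data are hypothetical, and the constants 0.24 and 0.35 in the modification are assumptions, chosen because they reproduce the approximate points for $\sqrt{n}V$ listed in Table 4.

```python
import math

def kuiper_exponential(sample):
    """Return (V, V_star) for the exponential null with estimated scale."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    z = [1.0 - math.exp(-x / mean) for x in xs]
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    v = d_plus + d_minus
    root_n = math.sqrt(n)
    # Assumed form of the modification: (V - 0.2/n)(sqrt(n) + 0.24 + 0.35/sqrt(n)).
    v_star = (v - 0.2 / n) * (root_n + 0.24 + 0.35 / root_n)
    return v, v_star

# Upper tail points of V* from Table 1, keyed by percentage level.
V_STAR_POINTS = {10: 1.527, 5: 1.655, 2.5: 1.774, 1: 1.910}

v, v_star = kuiper_exponential([0.8, 0.1, 2.3, 1.2, 0.5])
reject_at_5_percent = v_star > V_STAR_POINTS[5]
```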


2.5  The Watson Statistic $U^2$.

(1)  Calculate $W^2$, as in section 2.3, and then $U^2 = W^2 - n(\bar{z} - \tfrac{1}{2})^2$, where $\bar{z}$ is the mean of the $z_i$, i.e., $\bar{z} = \sum_i z_i / n$.

(2)  Modification. Calculate $U^* = U^2 (1 + 0.16/n)$.

(3)  Test of $H_0$. Compare $U^*$ with its upper tail percentage points, in Table 1.
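A sketch of the Watson calculation follows, again with a hypothetical function name and data. It assumes the usual definition in which $n(\bar{z} - 1/2)^2$ is subtracted from $W^2$.

```python
import math

def watson_exponential(sample):
    """Return (U2, U_star) for the exponential null with estimated scale."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    z = [1.0 - math.exp(-x / mean) for x in xs]
    w2 = sum((zi - (2 * i - 1) / (2 * n)) ** 2
             for i, zi in enumerate(z, start=1)) + 1.0 / (12 * n)
    z_bar = sum(z) / n
    u2 = w2 - n * (z_bar - 0.5) ** 2     # assumed: U^2 = W^2 - n (z_bar - 1/2)^2
    return u2, u2 * (1.0 + 0.16 / n)

u2, u_star = watson_exponential([0.8, 0.1, 2.3, 1.2, 0.5])
```

The value `u_star` would then be compared with the $U^*$ line of Table 1.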


2.6  Table of Percentage Points.


The percentage points for each statistic, for values of $n$ = 6, 8, 10, 16, 20, 40, 50, 60, 80, 100, were found by drawing Monte Carlo samples from $f(x) = e^{-x}$, and then calculating the statistics. 10,000 samples were drawn for each $n$. The percentage points for $\sqrt{n}D$ were plotted against $1/n$, and extrapolated to $1/n = 0$ to give the asymptotic points for $\sqrt{n}D$; these are the same as those for $D^*$, quoted in Table 1. Similarly for the other statistics; the points in Table 1 are the asymptotic points for $W^2$, $\sqrt{n}V$ and $U^2$. The actual percentage points, at the 5% and 1% level, obtained from the smoothed graphs, are given in Table 2. Those for $\sqrt{n}D$ may be compared with those for $D$ in Lilliefors (1969). They give excellent agreement for low values of $n$; for higher $n$, Lilliefors' asymptotic values are lower than those in Table 1, but are based on samples not larger than 35. In Table 3, we give a table of estimated moments of the distributions; for a statistic, say $T$, we give the first moment, (sum of 10,000 $T$-values)/10,000, and similarly the $k$-th moment, $\sum T^k / 10{,}000$, for $k$ = 2, 3 and 4. These will be of interest if any attempt can be made on the exact distributions of the four statistics.
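The Monte Carlo step for one statistic can be sketched as below, for $\sqrt{n}D$ only. This is an illustration under stated assumptions: the function names, the seed and the simple order-statistic estimate of the percentage point are not from the paper, and the smoothing against $1/n$ and the extrapolation to $1/n = 0$ are not reproduced.

```python
import math
import random

def sqrt_n_d(sample):
    """sqrt(n) * D for the exponential null, scale estimated by 1/mean."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    z = [1.0 - math.exp(-x / mean) for x in xs]
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    return math.sqrt(n) * max(d_plus, d_minus)

def monte_carlo_point(n, level, reps=10000, seed=12345):
    """Empirical upper tail point at the given level (e.g. 0.05)."""
    rng = random.Random(seed)          # fixed seed, assumed only for repeatability
    values = sorted(
        sqrt_n_d([rng.expovariate(1.0) for _ in range(n)])   # samples from f(x) = exp(-x)
        for _ in range(reps)
    )
    index = min(reps - 1, reps - round(level * reps))
    return values[index]

# A smaller run than the 10,000 samples used in the paper, to keep it quick.
point = monte_carlo_point(n=20, level=0.05, reps=2000)
```

With enough replications the result for $n = 20$ should fall close to the 5% entry for $\sqrt{n}D$ in Table 2, up to sampling error.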

2.7  Modifications.

The modifications effectively give approximations for the percentage points of a statistic; for example, setting $D^* = 0.990$ and solving for $D$, for any $n$, gives an approximation to the 10% point for $D$ at that value of $n$. Table 4 compares the approximations with the smoothed Monte Carlo values. If $\alpha'$ is the true significance level attained by an approximate point calculated for level $\alpha$, the error $|\alpha' - \alpha|$ can be seen to be negligible.
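Solving the modification for $D$ can be sketched as follows. The function name is hypothetical, and the coefficient 0.5 in the modification is an assumption that agrees with the approximate values in Table 4.

```python
import math

# Upper tail points of D* from Table 1, keyed by percentage level.
D_STAR_POINTS = {10: 0.990, 5: 1.094, 2.5: 1.190, 1: 1.308}

def approximate_d_point(n, level_percent):
    """Invert D* = (D - 0.2/n)(sqrt(n) + 0.26 + 0.5/sqrt(n)) for D."""
    root_n = math.sqrt(n)
    return D_STAR_POINTS[level_percent] / (root_n + 0.26 + 0.5 / root_n) + 0.2 / n

# Approximate 5% and 1% points of sqrt(n) * D at n = 10.
sqrt_n_d_5 = math.sqrt(10) * approximate_d_point(10, 5)
sqrt_n_d_1 = math.sqrt(10) * approximate_d_point(10, 1)
```

These two values can be set beside the "Approx." row for $n = 10$ in Table 4.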

2.8  Chi-square Approximations to True Asymptotic Points.

An excellent approximation to the percentage points for $D^*$, given in Table 1, is

(2)  $D^*(\alpha) = 0.017 + 0.0343\,\chi^2_{20}(\alpha)$,

where $D^*(\alpha)$ and $\chi^2_{20}(\alpha)$ are the upper tail percentage points, at level $\alpha$, of $D^*$ and of $\chi^2$ with 20 degrees of freedom. Such an approximation is useful for computer work; given a sample, $H_0$ is tested by calculating $D$, then $D^*$, and then $(D^* - 0.017)/0.0343$; this is then output and referred to the upper tail of the $\chi^2_{20}$ table. Chi-square approximations are also useful for combinations of tests.

For the approximation (2), the degrees of freedom of $\chi^2$ was chosen to give the curvature in the tail close to that of $D^*$. Strictly, a different number of degrees of freedom is slightly better; but since $D^*$ is derived from Monte Carlo results, and $\chi^2$ with that number of degrees of freedom is often not tabulated, $\chi^2_{20}$ was used. The constants 0.017 and 0.0343 were found by matching the 5% and 1% points.

Table 5 contains the percentage points given by this approximation and those for $V$, $W^2$ and $U^2$ which follow. Comparison with the Monte Carlo points, from Table 1, shows that they are all very good. The $V^*$ approximation, obtained as for $D^*$, is

(3)  $V^*(\alpha) = -0.336 + 0.0295\,\chi^2_{50}(\alpha)$.
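Referring a modified statistic to a chi-square tail can be sketched without tables, since the upper tail of $\chi^2$ with an even number of degrees of freedom has a closed form. The degrees of freedom used here (20 for $D^*$, 50 for $V^*$) are assumptions, chosen so that the mean and variance of $a + b\chi^2$ agree with those listed for $D^*$ and $V^*$ in Table 5; the function names are hypothetical.

```python
import math

def chi2_upper_tail_even(x, df):
    """P(chi-square with even df exceeds x), from the closed-form Poisson sum."""
    if df % 2 != 0:
        raise ValueError("this sketch handles even degrees of freedom only")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):        # adds (x/2)^k / k! for k = 1 .. df/2 - 1
        term *= half / k
        total += term
    return math.exp(-half) * total

def d_star_tail_probability(d_star):
    """Approximate significance level of D*, assuming 20 degrees of freedom."""
    return chi2_upper_tail_even((d_star - 0.017) / 0.0343, 20)

def v_star_tail_probability(v_star):
    """Approximate significance level of V*, assuming 50 degrees of freedom."""
    return chi2_upper_tail_even((v_star + 0.336) / 0.0295, 50)

p_d = d_star_tail_probability(1.094)   # 5% point of D* from Table 1
p_v = v_star_tail_probability(1.655)   # 5% point of V* from Table 1
```

A program can then report an approximate significance level directly instead of looking up a percentage point.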


For the $W^2$ and $U^2$ statistics, further information is available on the asymptotic distributions; the mean $\mu$ and variance $\sigma^2$ may be found exactly by methods of Darling (1955). Darling gives, for the asymptotic distribution of $W^2$, $\mu = 0.09259$ and $\sigma^2 = 0.004357$; similar calculations for $U^2$ give $\mu = 0.07176$ and $\sigma^2 = 0.0019858$ (Stephens, 1969b). This information may be incorporated to give $a + b\chi^2_p$ approximations in several ways; for a full discussion, see Stephens (1969a), where the technique was applied in connection with tests for normality. We give here the approximations obtained by choosing $p$ as before, and then matching the mean and the 5% point. They are:

(4)  $W^*(\alpha) = 0.0460 + 0.0466\,\chi^2_1(\alpha)$;

(5)  $U^*(\alpha) = 0.0265 + 0.0226\,\chi^2_2(\alpha)$.

Percentage points given by these approximations are in Table 5, together with the means and variances; the latter compare excellently with the exact values quoted above.
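Matching the mean and the 5% point amounts to solving two linear equations for $a$ and $b$, as the sketch below shows. The degrees of freedom (1 for $W^*$, 2 for $U^*$), the mean 0.07176 and the 5% point 0.162 for $U^*$ are assumptions, chosen to agree with the means, variances and points in Table 5; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_upper_point(level, df):
    """Upper tail point of chi-square for df = 1 or 2 (closed forms only)."""
    if df == 1:
        return NormalDist().inv_cdf(1.0 - level / 2.0) ** 2
    if df == 2:
        return -2.0 * math.log(level)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

def match_mean_and_point(mean, point, level, df):
    """Solve a + b*df = mean and a + b*chi2(level) = point for (a, b)."""
    b = (point - mean) / (chi2_upper_point(level, df) - df)
    return mean - b * df, b

# Exact asymptotic means from the text, 5% points from Table 1.
a_w, b_w = match_mean_and_point(0.09259, 0.225, 0.05, df=1)
a_u, b_u = match_mean_and_point(0.07176, 0.162, 0.05, df=2)

# Variance implied by a + b*chi2_p is 2 * p * b**2.
var_w = 2 * 1 * b_w ** 2
var_u = 2 * 2 * b_u ** 2
```

The implied variances can be compared with the exact asymptotic variances quoted in the text as a check on the choice of $p$.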

2.9  Power of the Tests.

It has been mentioned that Kolmogorov-type statistics would be expected to be more powerful than the usual Pearson chi-square statistic in the situation considered here. Lilliefors (1969) has confirmed this, for the statistic $D$, and has also given some comparisons, for $D$, when the distribution of the sample is actually $\chi^2$ or lognormal. We have supplemented Lilliefors' results by also taking Monte Carlo samples from these two distributions, so that the four statistics may be compared. Samples were also taken from the half-normal distribution; i.e., $x$ was chosen from a $N(0,1)$ population and the absolute value of $x$ used as the sample observation. Results are given in Table 6. $W^2$ seems a better statistic than $D$, and $U^2$ than $V$. Since $W^2$ and $U^2$ are essentially a measure of the "sum" of the discrepancies between $F_n(x)$ and $F(x)$ at every point, they might be expected to detect more subtle departures from the null hypothesis than $D$ or $V$; when $U^2$ is better than $W^2$ is itself an interesting question. There are, of course, many other ways of testing for exponentiality; other power comparisons are being made and will be published separately.
TABLE 1

Upper tail percentage points of modified Kolmogorov-type statistics

                     % level
Statistic     10       5       2.5      1
D*          0.990    1.094    1.190    1.308
W*          0.178    0.225    0.276    0.349
V*          1.527    1.655    1.774    1.910
U*          0.131    0.162    0.193    0.233


TABLE 2

5% and 1% upper tail percentage points for $\sqrt{n}D$, $W^2$, $\sqrt{n}V$ and $U^2$, for use in testing for exponentiality when the scale parameter must be estimated.

             √nD               W²               √nV               U²
  n       5%      1%       5%      1%       5%      1%       5%      1%
  6     1.006   1.174    0.216   0.317            1.733    0.158   0.224
  8     1.017   1.197    0.219   0.325    1.537   1.757    0.159   0.226
 10     1.025   1.212    0.220   0.330    1.551   1.776    0.159   0.227
 12     1.033   1.223    0.221   0.334    1.562   1.790    0.160   0.228
 15     1.042   1.235    0.222   0.337    1.574   1.808    0.160   0.229
 20     1.052   1.248    0.223   0.340    1.587   1.828    0.160   0.230
 25     1.058   1.258    0.224   0.342    1.597   1.840    0.161   0.231
 30     1.064   1.264    0.224   0.344    1.604   1.838    0.161   0.231
 40     1.070   1.274    0.224   0.345    1.614   1.861    0.161   0.231
 50     1.074   1.278    0.225   0.346    1.621   1.868    0.161   0.232
100     1.083   1.291    0.225   0.348    1.638   1.889    0.162   0.233


TABLE 4

Comparison of Monte Carlo and approximate percentage points for four statistics

The values given are for the 5% and 1% upper tail percentage points.

                         √nD               W²               √nV               U²
  n    % level:       5       1        5       1        5       1        5       1
 10    M.C.         1.025   1.212    0.220   0.330    1.551   1.776    0.159   0.227
       Approx.      1.030   1.219    0.221   0.343    1.553   1.783    0.159   0.229
 20    M.C.         1.052   1.248    0.223   0.340    1.587   1.828    0.160   0.230
       Approx.      1.055   1.252    0.223   0.346    1.590   1.828    0.161   0.231
 50    M.C.         1.074   1.278    0.225   0.346    1.621   1.868    0.161   0.232
       Approx.      1.07    1.28     0.224   0.348    1.62    1.87     0.161   0.232
100    M.C.         1.083   1.291    0.225   0.348    1.638   1.889    0.162   0.233
       Approx.      1.08    1.29     0.225   0.348    1.64    1.89     0.162   0.233


TABLE 5

                                        % level α
Statistic      μ         σ²         10       5       2.5      1
D*           0.70      0.047      0.991    1.094    1.190    1.308
W*           0.0926    0.00434    0.172    0.225    0.280    0.355
V*           1.14      0.087      1.528    1.655    1.771    1.910
U*           0.0718    0.00204    0.131    0.162    0.193    0.233
TABLE 6

Power Comparisons

The table gives the number of samples, out of 1000, that were significant when the test for exponentiality was applied at the 10% level, and the true distribution is as shown; n is the number in each sample.

Sample size                           Statistic
     n        Distribution       D       W²      V       U²
    10        χ²                316     349     291     302
    20        χ²                545     599     473     498
    10        lognormal         170     171     155     173
    20        lognormal         206     213     197     229
    10        half-normal       201     216     184     200
    20        half-normal       305     337     257     281